fix: exclude submodules outside src folder when src is specified#1256
Conversation
When a subproject specifies a src directory, git submodule update --init would clone ALL submodules (including those outside src), leaving their directories in the destination. _apply_src_and_ignore now detects which submodules are outside src (path unchanged by strip_glob_prefix), removes their top-level directory, and excludes them from the returned list. Two new BDD scenarios cover this: - plain src directory (non-glob) with a submodule inside - mixed submodules inside and outside src, only inside is fetched https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughCentralize src-aware submodule filtering into ChangesSubmodule filtering when src is specified
Sequence Diagram(s) sequenceDiagram
participant Apply as GitLocalRepo._apply_src_and_ignore
participant Filter as _filter_submodules_by_src
participant Strip as strip_glob_prefix
participant SafeRM as safe_rm / scheduled removals
participant Move as _move_src_folder_up
Apply->>Filter: call(remote, src, submodules)
loop per submodule
Filter->>Strip: compute rewritten path under src
alt rewritten path within src
Filter->>Filter: update submodule.path (keep in within_src)
else exact-match src
Filter->>Filter: keep entry unchanged for promotion
else outside src
Filter->>SafeRM: schedule and remove path
end
end
Filter->>SafeRM: remove now-empty parent dirs after deletions
Filter->>Move: promote src folder to repo root
Filter-->>Apply: return within_src (kept submodules)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dfetch/vcs/git.py`:
- Line 430: Collect unique top-level directories before calling safe_rm to avoid
redundant removals: instead of calling safe_rm(Path(submodule.path).parts[0],
within=".") for each submodule, build a set of Path(submodule.path).parts[0] (or
equivalent) for all relevant submodules and then iterate that set to call
safe_rm once per unique top-level directory; update the code around the existing
safe_rm invocation in dfetch.vcs.git to use this deduplicated collection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0f0542b2-6e51-4798-a28b-1f31d2fac6b6
📒 Files selected for processing (2)
dfetch/vcs/git.pyfeatures/fetch-git-repo-with-submodule.feature
- Extract _filter_submodules_by_src from _apply_src_and_ignore to keep cyclomatic complexity within xenon's block-grade-B limit - Fix mypy strict error in log.py: replace LogRender inheritance with composition to avoid subclassing an untyped (Any) class - Fix mypy strict error in license.py: unpack probable_licenses[0] explicitly so mypy sees exactly two positional args to from_inferred, eliminating the spurious "multiple values for keyword argument 'text'" https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
When multiple submodules outside src share the same top-level directory, collect unique top-level dirs first and call safe_rm once per dir instead of once per submodule. https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dfetch/log.py (1)
67-69:⚠️ Potential issue | 🟡 MinorAvoid mutating RichHandler private
_log_renderFile:
dfetch/log.py
Lines: 67-69The below code mutates RichHandler’s private rendering internals:
handler._log_render = _NoExpandLogRender( # pylint: disable=protected-access show_time=False, show_level=False, show_path=False )Rich doesn’t provide a public API for customizing
_log_render; consider refactoring to a customRichHandlersubclass that overridesrender(oremit) to apply the non-expanding table behavior (avoiding theprotected-accesssuppression), or document why no supported hook fits this asciinema-specific requirement.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dfetch/log.py` around lines 67 - 69, The code is mutating RichHandler's private attribute handler._log_render by assigning _NoExpandLogRender; instead create a custom subclass of RichHandler (e.g., class NoExpandRichHandler(RichHandler)) and implement an override of render (or emit) to return/apply the non-expanding table behavior used by _NoExpandLogRender, then instantiate that subclass instead of assigning to handler._log_render; replace usages that create a RichHandler with the new NoExpandRichHandler and remove the protected-access suppression and direct assignment to _log_render in dfetch/log.py.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dfetch/vcs/git.py`:
- Around line 442-447: The code currently treats submodules whose path exactly
equals src as outside scope because strip_glob_prefix returns the same path;
modify the logic in the block around strip_glob_prefix so you first check if
submodule.path == src and, if so, treat that submodule as in-scope by appending
it to within_src (and do NOT add it to to_remove or change its path), otherwise
continue using new_path != submodule.path to decide whether to update
submodule.path and append to within_src or add the top-level segment to
to_remove; reference functions/vars: strip_glob_prefix, submodule.path, src,
within_src, to_remove.
- Around line 447-450: The code currently adds only the first path component
(Path(submodule.path).parts[0]) to to_remove which can accidentally remove
shared ancestors like "apps" when src is nested; instead add the excluded
submodule's full path or only an ancestor proven disjoint from src. Replace the
parts[0] usage so you collect the exact excluded path (e.g. Path(submodule.path)
as a string) or, if you must remove an ancestor, test that
Path(submodule.path).parts does not share the src prefix (compare
Path(src).parts) before adding it to to_remove; then call safe_rm() with that
exact path(s) and leave _move_src_folder_up(remote, src) unchanged.
---
Outside diff comments:
In `@dfetch/log.py`:
- Around line 67-69: The code is mutating RichHandler's private attribute
handler._log_render by assigning _NoExpandLogRender; instead create a custom
subclass of RichHandler (e.g., class NoExpandRichHandler(RichHandler)) and
implement an override of render (or emit) to return/apply the non-expanding
table behavior used by _NoExpandLogRender, then instantiate that subclass
instead of assigning to handler._log_render; replace usages that create a
RichHandler with the new NoExpandRichHandler and remove the protected-access
suppression and direct assignment to _log_render in dfetch/log.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c3237f98-7a13-4e72-a321-53b99878da87
📒 Files selected for processing (3)
dfetch/log.pydfetch/util/license.pydfetch/vcs/git.py
Composition broke CI because handler._log_render is typed as LogRender in rich's stubs there, making the assignment incompatible. Revert to inheritance; use type: ignore[misc] to suppress the local-only error where rich._log_render has no stubs and LogRender resolves to Any. warn_unused_ignores=false means the comment is harmless in CI. https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
git.py – _filter_submodules_by_src: - Issue 1: submodule whose path exactly equals src was treated as outside-scope because strip_glob_prefix returns unchanged when path depth <= pattern depth. Now checked first; submodule is added to within_src without altering its path so _move_src_folder_up can promote it correctly. - Issue 2: using parts[0] as the ancestor to remove could clobber the src tree when src is nested (e.g. src: apps/myapp, submodule: apps/other would incorrectly remove apps/). Now uses Path(src).is_relative_to() to detect a shared ancestor and falls back to the exact submodule path; the shared ancestor is cleaned up automatically by _move_src_folder_up. log.py – _NoExpandRichHandler: Replace _NoExpandLogRender (which mutated handler._log_render, a protected attribute) with a proper RichHandler subclass that overrides the public render() method to set table.expand = False. configure_root_logger now selects the subclass via handler_class rather than post-construction mutation, removing the protected-access suppression entirely. https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
dfetch/vcs/git.py (1)
452-463:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't remove a submodule path that still contains the requested
src.With
src="apps/foo"and a top-level submodule atapps,strip_glob_prefix("apps", "apps/foo")stays unchanged, so Line 458 takes the shared-ancestor branch and Line 459 queuessafe_rm("apps"). That deletes the only tree containingapps/foobefore_move_src_folder_up()runs, so the requested subtree is lost. The ancestor check needs to use the fullsubmodule.path, and paths that are ancestors ofsrcshould be skipped here instead of removed.Suggested fix
else: - sub_top = Path(submodule.path).parts[0] + src_path = Path(src) + sub_path = Path(submodule.path) + sub_top = sub_path.parts[0] # Only remove the top-level component when it is provably disjoint # from src; if src lives under that same ancestor, removing it would # clobber the src tree before _move_src_folder_up can promote it. # In that case use the exact submodule path; the shared ancestor is # cleaned up automatically by _move_src_folder_up. - if Path(src).is_relative_to(sub_top): + if src_path.is_relative_to(sub_path): + continue + if src_path.is_relative_to(sub_top): to_remove.add(submodule.path) else: to_remove.add(sub_top)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dfetch/vcs/git.py` around lines 452 - 463, The current ancestor check uses only sub_top and thus can remove a top-level submodule that actually contains the requested src; change the logic to test whether src is inside the full submodule.path and skip removal in that case: instead of if Path(src).is_relative_to(sub_top): ... use if Path(src).is_relative_to(Path(submodule.path)): continue (or skip adding to to_remove) so that when src is a descendant of submodule.path you do not add either submodule.path or sub_top to to_remove; otherwise keep the existing behavior that adds submodule.path (or sub_top) and later calls safe_rm, and retain reference to _move_src_folder_up for the remaining cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dfetch/log.py`:
- Line 26: Remove the inline suppressions from the class declaration of
_NoExpandRichHandler: delete the trailing "# type: ignore[misc]" and "# pylint:
disable=too-few-public-methods" comments on the class line that defines
_NoExpandRichHandler (which subclasses RichHandler), leaving a plain class
declaration; ensure no other code changes are made and run type/pylint checks
(note that the pylint warning applies to ExtLogFilter, not
_NoExpandRichHandler).
---
Duplicate comments:
In `@dfetch/vcs/git.py`:
- Around line 452-463: The current ancestor check uses only sub_top and thus can
remove a top-level submodule that actually contains the requested src; change
the logic to test whether src is inside the full submodule.path and skip removal
in that case: instead of if Path(src).is_relative_to(sub_top): ... use if
Path(src).is_relative_to(Path(submodule.path)): continue (or skip adding to
to_remove) so that when src is a descendant of submodule.path you do not add
either submodule.path or sub_top to to_remove; otherwise keep the existing
behavior that adds submodule.path (or sub_top) and later calls safe_rm, and
retain reference to _move_src_folder_up for the remaining cleanup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eb36f474-7b8e-4742-8aef-8e4d8bc0002b
📒 Files selected for processing (2)
dfetch/log.pydfetch/vcs/git.py
…submodule ancestor check; add unit tests
- log.py: remove `pylint: disable=too-few-public-methods` from _NoExpandRichHandler (it
inherits many methods from RichHandler so pylint never flags it); retain `type: ignore[misc]`
which mypy requires because rich lacks complete type stubs.
- vcs/git.py: in _filter_submodules_by_src replace the fragile two-branch
`sub_top` guard with a single `Path(src).is_relative_to(Path(submodule.path))`
check. The old code computed the top-level path component separately and
could mistakenly remove a top-level submodule whose full path is actually an
ancestor of src (e.g. submodule at "apps", src="apps/myapp"). The new code
tests the full path and skips removal with `continue`, leaving the ancestor
submodule in place so _move_src_folder_up can promote its content correctly.
- tests/test_git_vcs.py: add two new unit tests for _filter_submodules_by_src:
- test_filter_submodules_ancestor_of_src_not_removed: proves that a submodule
at "apps" is not deleted when src="apps/myapp" and that the inner content is
promoted to root.
- test_filter_submodules_disjoint_submodule_removed: proves that a truly
out-of-scope submodule (different top-level dir) is still removed.
https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
♻️ Duplicate comments (2)
dfetch/log.py (1)
26-26:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove inline type suppression and fix the underlying typing mismatch.
The inline
# type: ignore[misc]on_NoExpandRichHandlerviolates the repository rule for production code suppressions.Suggested fix
-class _NoExpandRichHandler(RichHandler): # type: ignore[misc] +class _NoExpandRichHandler(RichHandler):As per coding guidelines, “Avoid inline lint suppressions (
# noqa,# type: ignore,# pylint: disable,# pyright: ignore) without fixing the root cause; exception permitted for module-level tool headers (lines 1–5) in test files.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dfetch/log.py` at line 26, Remove the inline "# type: ignore[misc]" on the class declaration for _NoExpandRichHandler and fix the underlying typing mismatch: make _NoExpandRichHandler(RichHandler) conform to RichHandler's type signature by adjusting the class' base/generic parameters or by adding the missing/incorrect type annotations on its methods (e.g., __init__, emit, or any overridden methods) so the subclass matches the parent's expected signatures; ensure imports for any typing primitives (TypeVar, Generic, Any, Optional) are added and not suppressed so the typechecker accepts the class without an inline ignore.Source: Coding guidelines
dfetch/vcs/git.py (1)
452-456:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winTop-level deletion can still remove the selected
srctree.
to_remove.add(Path(submodule.path).parts[0])on Line 454 is unsafe when excluded submodules share the same top-level directory assrc(e.g.,apps/otherwithsrc=apps/myapp), because deletingappsremoves in-scope content before_move_src_folder_up.Suggested fix
else: if Path(src).is_relative_to(Path(submodule.path)): continue - to_remove.add(Path(submodule.path).parts[0]) + src_top = Path(src).parts[0] if Path(src).parts else "" + sub_path = Path(submodule.path) + sub_top = sub_path.parts[0] if sub_path.parts else "" + to_remove.add(submodule.path if sub_top == src_top else sub_top)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dfetch/vcs/git.py` around lines 452 - 456, The current logic collects only the top-level directory (Path(submodule.path).parts[0]) which can delete a sibling tree that contains the target src (e.g., apps); instead, collect and remove the exact submodule paths and not their top-level parent: change the to_remove set to hold full Path(submodule.path) values (keep the existing check using src and submodule.path) and call safe_rm on each full submodule path (not the top-level part), referencing variables/functions submodule.path, src, to_remove, Path(), safe_rm and the surrounding _move_src_folder_up flow so the removals only delete the precise submodule dirs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@dfetch/log.py`:
- Line 26: Remove the inline "# type: ignore[misc]" on the class declaration for
_NoExpandRichHandler and fix the underlying typing mismatch: make
_NoExpandRichHandler(RichHandler) conform to RichHandler's type signature by
adjusting the class' base/generic parameters or by adding the missing/incorrect
type annotations on its methods (e.g., __init__, emit, or any overridden
methods) so the subclass matches the parent's expected signatures; ensure
imports for any typing primitives (TypeVar, Generic, Any, Optional) are added
and not suppressed so the typechecker accepts the class without an inline
ignore.
In `@dfetch/vcs/git.py`:
- Around line 452-456: The current logic collects only the top-level directory
(Path(submodule.path).parts[0]) which can delete a sibling tree that contains
the target src (e.g., apps); instead, collect and remove the exact submodule
paths and not their top-level parent: change the to_remove set to hold full
Path(submodule.path) values (keep the existing check using src and
submodule.path) and call safe_rm on each full submodule path (not the top-level
part), referencing variables/functions submodule.path, src, to_remove, Path(),
safe_rm and the surrounding _move_src_folder_up flow so the removals only delete
the precise submodule dirs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: caf8e1b0-c9fd-4e5f-8f2f-8871a8523c22
📒 Files selected for processing (3)
dfetch/log.pydfetch/vcs/git.pytests/test_git_vcs.py
… handle empty parents and .git collision
The previous fix used Path(submodule.path).parts[0] in to_remove, which for a
sibling submodule (e.g. apps/widget when src=apps/lib) would add the shared
top-level dir 'apps' to to_remove and call safe_rm('apps'), destroying the
already-cloned apps/lib content before _move_src_folder_up could promote it.
Changes:
- vcs/git.py: _filter_submodules_by_src now adds the exact submodule.path to
to_remove instead of its first path component, so only the precise out-of-scope
submodule directory is deleted.
- vcs/git.py: add _remove_empty_parents static helper that walks up from each
removed path and rmdir()s any ancestor directories that become empty. git
submodule update creates parent dirs (e.g. other_ext/) for submodules excluded
by sparse-checkout; the old top-level removal cleaned those up for free, so
the helper restores that behaviour without the sibling-clobbering side-effect.
- vcs/git.py: _apply_move now pre-removes .git and .gitmodules from the root of
the chosen directory before promoting its contents to the repo root. When src
is itself a cloned submodule (e.g. src: apps/lib), the submodule's .git file
would collide with the parent repo's .git directory; removing it beforehand
avoids the shutil.Error. The caller (gitsubproject) already cleans up these
files recursively after checkout completes.
- tests/test_git_vcs.py: update test_filter_submodules_disjoint_submodule_removed
to assert on the exact submodule path (not its parent); add new
test_filter_submodules_sibling_of_src_not_removed which proves the bug — with
the old code safe_rm('apps') would remove apps/lib/ before promotion, so
README.md would never reach root.
- features/fetch-git-repo-with-submodule.feature: add end-to-end scenario
"A sibling submodule at the same top-level dir as src is not fetched" that
fails under the old top-level removal logic and passes with the fix.
https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
The pre-commit mypy binary runs in an isolated uv-tool venv that lacks rich, so RichHandler resolved to Any and triggered [misc] on subclassing. Move the suppression to a targeted per-module mypy override (disallow_subclassing_any = false for dfetch.log) so the inline comment is gone and the setting is explicit in configuration. https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
|
@coderabbitai review |
✅ Action performedReview finished.
|
When a subproject specifies a src directory, git submodule update --init
would clone ALL submodules (including those outside src), leaving their
directories in the destination. _apply_src_and_ignore now detects which
submodules are outside src (path unchanged by strip_glob_prefix), removes
their top-level directory, and excludes them from the returned list.
Two new BDD scenarios cover this:
https://claude.ai/code/session_01KjAnpdH8M33gT1yZPraVx8
Summary by CodeRabbit
Bug Fixes
Tests
Refactor